AITopics | validation technique

Stabilizing Machine Learning for Reproducible and Explainable Results: A Novel Validation Approach to Subject-Specific Insights

Vos, Gideon, van Eijk, Liza, Sarnyai, Zoltan, Azghadi, Mostafa Rahimi

arXiv.org Machine Learning, Dec-16-2024

Machine Learning is transforming medical research by improving diagnostic accuracy and personalizing treatments. General ML models trained on large datasets identify broad patterns across populations, but their effectiveness is often limited by the diversity of human biology. This has led to interest in subject-specific models that use individual data for more precise predictions. However, these models are costly and challenging to develop. To address this, we propose a novel validation approach that uses a general ML model to ensure reproducible performance and robust feature importance analysis at both group and subject-specific levels. We tested a single Random Forest (RF) model on nine datasets varying in domain, sample size, and demographics. Different validation techniques were applied to evaluate accuracy and feature importance consistency. To introduce variability, we performed up to 400 trials per subject, randomly seeding the ML algorithm for each trial. This generated 400 feature sets per subject, from which we identified top subject-specific features. A group-specific feature importance set was then derived from all subject-specific results. We compared our approach to conventional validation methods in terms of performance and feature importance consistency. Our repeated trials approach, with random seed variation, consistently identified key features at the subject level and improved group-level feature importance analysis using a single general model. Subject-specific models address biological variability but are resource-intensive. Our novel validation technique provides consistent feature importance and improved accuracy within a general ML model, offering a practical and explainable alternative for clinical research.
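The repeated-trials idea in the abstract above can be sketched roughly as follows. This is only an illustration of the aggregation step, not the authors' implementation: `trial_importances` is a hypothetical stand-in for training a seeded Random Forest and reading its feature importances, and the feature names, weights, noise level, and `k` are all assumed values.

```python
import random
from collections import Counter


def trial_importances(true_weights, seed, noise=0.3):
    """Hypothetical stand-in for one seeded model fit.

    A real pipeline would train the model with this seed and read its
    feature importances; here fixed weights are perturbed with noise.
    """
    rng = random.Random(seed)
    return {name: max(0.0, weight + rng.gauss(0.0, noise))
            for name, weight in true_weights.items()}


def top_k(importances, k):
    """Names of the k features with the largest importance in one trial."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


def stable_subject_features(true_weights, n_trials=400, k=2):
    """Count how often each feature reaches the top k across seeded trials."""
    counts = Counter()
    for seed in range(n_trials):
        counts.update(top_k(trial_importances(true_weights, seed), k))
    return [name for name, _ in counts.most_common(k)]


def group_features(subject_top_lists, k=2):
    """Derive a group-level set from all subject-level top features."""
    counts = Counter(name for tops in subject_top_lists for name in tops)
    return [name for name, _ in counts.most_common(k)]


# Assumed toy data: three subjects, four features each.
subjects = {
    "subject_a": {"f0": 1.0, "f1": 0.8, "f2": 0.1, "f3": 0.1},
    "subject_b": {"f0": 1.0, "f1": 0.1, "f2": 0.8, "f3": 0.1},
    "subject_c": {"f0": 1.0, "f1": 0.8, "f2": 0.1, "f3": 0.1},
}
per_subject = {name: stable_subject_features(weights)
               for name, weights in subjects.items()}
group_top = group_features(per_subject.values())
```

A single noisy trial can rank a weak feature highly by chance; counting top-k membership over many seeds is one simple way to make the subject-level ranking less sensitive to any one seed.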

artificial intelligence, feature importance, machine learning, (15 more...)

arXiv.org Machine Learning

2412.16199

Country:

Europe > Austria > Vienna (0.14)
Oceania > Australia (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.95)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.94)

A Simple Introduction to Validating and Testing a Model- Part 1

#artificialintelligence, Sep-5-2020, 12:40:38 GMT

This technique addresses the issues of the hold-out validation technique. Here we make sure that each set has a similar distribution, which will eventually help us build a better model. Now that we know what these two techniques are, let's have a look at the code. We will be using Python 3.0. Here, df will hold the dataset that we want to use. We can see that the data has 5 rows and 25 columns, where Survived is our target (dependent) variable and the rest are the independent variables.
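As a rough illustration of keeping a similar class distribution in each set, the sketch below splits indices into folds class by class. It is not the article's code: the function name `stratified_k_fold`, the 20/80 label mix, and `k=5` are assumptions chosen for the example, and only the standard library is used.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Return k lists of indices with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Assumed toy target: 20 positives and 80 negatives.
labels = [1] * 20 + [0] * 80
folds = stratified_k_fold(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

With a plain random hold-out split, a minority class can end up over- or under-represented in the held-out part; dealing each class out separately avoids that. When a class size is not divisible by `k`, the earlier folds receive the extra samples.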

artificial intelligence, machine learning, validating and testing, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Fifty new planets confirmed in machine learning first

#artificialintelligence, Aug-30-2020, 14:46:37 GMT

Fifty potential planets have had their existence confirmed by a new machine learning algorithm developed by University of Warwick scientists. For the first time, astronomers have used a process based on machine learning, a form of artificial intelligence, to analyse a sample of potential planets and determine which ones are real and which are 'fakes', or false positives, calculating the probability of each candidate to be a true planet. Their results are reported in a new study published in the Monthly Notices of the Royal Astronomical Society, where they also perform the first large scale comparison of such planet validation techniques. Their conclusions make the case for using multiple validation techniques, including their machine learning algorithm, when statistically confirming future exoplanet discoveries. Many exoplanet surveys search through huge amounts of data from telescopes for the signs of planets passing between the telescope and their star, known as transiting. This results in a telltale dip in light from the star that the telescope detects, but it could also be caused by a binary star system, interference from an object in the background, or even slight errors in the camera.

artificial intelligence, machine learning, planet, (15 more...)

#artificialintelligence

Genre:

Research Report > New Finding (0.50)
Press Release (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Fifty new planets confirmed in machine learning first

#artificialintelligence, Aug-29-2020, 00:32:26 GMT

For the first time, astronomers have used a process based on machine learning, a form of artificial intelligence, to analyse a sample of potential planets and determine which ones are real and which are 'fakes', or false positives, calculating the probability of each candidate to be a true planet. Their results are reported in a new study published in the Monthly Notices of the Royal Astronomical Society, where they also perform the first large scale comparison of such planet validation techniques. Their conclusions make the case for using multiple validation techniques, including their machine learning algorithm, when statistically confirming future exoplanet discoveries. Many exoplanet surveys search through huge amounts of data from telescopes for the signs of planets passing between the telescope and their star, known as transiting. This results in a telltale dip in light from the star that the telescope detects, but it could also be caused by a binary star system, interference from an object in the background, or even slight errors in the camera. These false positives can be sifted out in a planetary validation process.

artificial intelligence, machine learning, planet, (16 more...)

#artificialintelligence

Genre: Research Report > New Finding (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

50 new planets confirmed in machine learning first

#artificialintelligence, Aug-25-2020, 14:45:53 GMT

Fifty potential planets have been confirmed by a new machine learning algorithm developed by University of Warwick scientists. For the first time, astronomers have used a process based on machine learning, a form of artificial intelligence, to analyze a sample of potential planets and determine which ones are real and which are "fakes," or false positives, calculating the probability of each candidate to be a true planet. Their results are reported in a new study published in the Monthly Notices of the Royal Astronomical Society, where they also perform the first large scale comparison of such planet validation techniques. Their conclusions make the case for using multiple validation techniques, including their machine learning algorithm, when statistically confirming future exoplanet discoveries. Many exoplanet surveys search through huge amounts of data from telescopes for the signs of planets passing between the telescope and their star, known as transiting. This results in a telltale dip in light from the star that the telescope detects, but it could also be caused by a binary star system, interference from an object in the background, or even slight errors in the camera.

artificial intelligence, machine learning, planet, (16 more...)

#artificialintelligence

Genre: Research Report > New Finding (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Validation techniques beyond K-fold

#artificialintelligence, Jan-20-2020, 18:36:46 GMT

A validation dataset is a sample of data held back from training your model that is used to give an estimate of model skill while tuning the model's hyperparameters. The validation dataset is different from the test dataset, which is also held back from the training of the model but is instead used to give an unbiased estimate of the skill of the final tuned model when comparing or selecting between final models. There is much confusion in applied machine learning about what a validation dataset is exactly and how it differs from a test dataset. Validation techniques in machine learning are used to estimate an error rate for the ML model that is as close as possible to the true error rate on the population. If the data volume is large enough to be representative of the population, you may not need validation techniques.
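The distinction between the two held-back sets can be sketched as below: the validation set picks a hyperparameter, and the test set is touched only once, for the final estimate. Everything here is an assumed toy setup (synthetic 1-D data, a small k-nearest-neighbour classifier, a 60/20/20 split, and the candidate values of `k`), not code from the article.

```python
import random


def three_way_split(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle indices and cut them into train, validation and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def knn_predict(train_points, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train_points, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train_points, eval_points, k):
    hits = sum(knn_predict(train_points, x, k) == y for x, y in eval_points)
    return hits / len(eval_points)


# Assumed synthetic data: label is 1 when x > 0.5, with 10% label noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

train_idx, val_idx, test_idx = three_way_split(len(data))
train = [data[i] for i in train_idx]
val = [data[i] for i in val_idx]
test = [data[i] for i in test_idx]

# Tune the hyperparameter on the validation set only.
candidates = [1, 3, 5, 7, 9]
best_k = max(candidates, key=lambda k: accuracy(train, val, k))

# Report the final estimate on the untouched test set.
test_accuracy = accuracy(train, test, best_k)
```

Because `best_k` was chosen to look good on the validation set, the validation accuracy is an optimistic estimate; the test accuracy is the figure to report.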

dataset, error rate, validation technique, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.74)

Improve Your Model Performance using Cross Validation (in Python / R)

#artificialintelligence, Oct-1-2018, 23:06:59 GMT

This article was originally published on November 18, 2015 and updated on April 30, 2018. One of the most interesting and challenging things about hackathons is getting a high score on both public and private leaderboards. I have closely monitored the series of Data Hackathons and found an interesting trend. This trend is based on participant rankings on the public and private leaderboards. One thing that stood out was that participants who rank higher on the public leaderboard lose their position after their ranks get validated on the private leaderboard.

artificial intelligence, machine learning, validation, (16 more...)

#artificialintelligence

Genre: Contests & Prizes (0.57)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.54)

Optimizing Prediction Intervals by Tuning Random Forest via Meta-Validation

Bayley, Sean, Falessi, Davide

arXiv.org Machine Learning, Jan-22-2018

Recent studies have shown that tuning prediction models increases prediction accuracy and that Random Forest can be used to construct prediction intervals. However, to the best of our knowledge, no study has investigated the need to, and the manner in which one can, tune Random Forest for optimizing prediction intervals; this paper aims to fill this gap. We explore a tuning approach that combines an effectively exhaustive search with a validation technique on a single Random Forest parameter. This paper investigates which, out of eight validation techniques, are beneficial for tuning, i.e., which automatically choose a Random Forest configuration constructing prediction intervals that are reliable and with a smaller width than the default configuration. Additionally, we present and validate three meta-validation techniques to determine which are beneficial, i.e., those which automatically choose a beneficial validation technique. This study uses data from our industrial partner (Keymind Inc.) and the Tukutuku Research Project, related to post-release defect prediction and Web application effort estimation, respectively. Results from our study indicate that: i) the default configuration is frequently unreliable, ii) most of the validation techniques, including previously successfully adopted ones such as 50/50 holdout and bootstrap, are counterproductive in most of the cases, and iii) the 75/25 holdout meta-validation technique is always beneficial; i.e., it avoids the likely counterproductive effects of validation techniques.
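The selection criterion described above (intervals that are reliable and narrower than the default) could be checked on a held-out set roughly as follows. This is a sketch under stated assumptions rather than the paper's procedure: "reliable" is taken here to mean that the share of actual values falling inside their intervals is at least the nominal level, and the configuration names, predictions, and half-widths are made-up illustrative values.

```python
def coverage(intervals, actuals):
    """Share of actual values that fall inside their prediction interval."""
    inside = sum(low <= y <= high for (low, high), y in zip(intervals, actuals))
    return inside / len(actuals)


def mean_width(intervals):
    return sum(high - low for low, high in intervals) / len(intervals)


def pick_configuration(candidates, actuals, nominal=0.9):
    """Among reliable configurations, return the one with the narrowest intervals."""
    reliable = {name: intervals for name, intervals in candidates.items()
                if coverage(intervals, actuals) >= nominal}
    if not reliable:
        return None
    return min(reliable, key=lambda name: mean_width(reliable[name]))


# Assumed holdout data: point predictions and the values actually observed.
predictions = [11, 12, 10, 13, 11, 12, 9, 13, 11, 12]
actuals = [10, 12, 9, 15, 11, 13, 8, 14, 10, 12]

# Hypothetical configurations that differ only in interval half-width.
half_widths = {"default": 3.0, "tuned_a": 1.5, "tuned_b": 0.5}
candidates = {
    name: [(p - h, p + h) for p in predictions]
    for name, h in half_widths.items()
}

chosen = pick_configuration(candidates, actuals)
```

Here `tuned_b` produces the narrowest intervals but misses most actual values, so it is rejected; `tuned_a` keeps the assumed 90% coverage with half the width of `default`.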

artificial intelligence, configuration, machine learning, (15 more...)

arXiv.org Machine Learning

1801.07194

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
(2 more...)

Testing Machine Learning Algorithms with K-Fold Cross Validation - Talend

#artificialintelligence, Nov-5-2017, 11:05:26 GMT

In an earlier post on Applying Machine Learning to IoT Sensors, I discussed the process for classifying sensor data with a machine learning algorithm. In this post, I'll give a background on choosing an algorithm, then using a validation technique. For the technique, I'll show how to apply it, and how it can be built using the Talend Studio without hand coding. Given a prediction scenario involving a machine learning algorithm, the first question to ask is what is the appropriate machine learning algorithm? Taking the example of predicting a user's activity based on mobile phone accelerometer data, we must be able to classify a category for the data (resting, walking, or running).
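For readers who do want to see the technique in code, a minimal K-fold cross-validation loop might look like the sketch below. It is not taken from the post and does not use Talend: the nearest-centroid classifier, the single "movement magnitude" feature, and the synthetic readings for the three activities are all assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices and slice them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(samples):
    """Mean feature value per activity label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def cross_validate(samples, k=5):
    """Train on k-1 folds, score on the held-out fold, repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train = [s for i, s in enumerate(samples) if i not in held]
        test = [samples[i] for i in held_out]
        centroids = fit_centroids(train)
        hits = sum(predict(centroids, value) == label for value, label in test)
        scores.append(hits / len(test))
    return scores


# Assumed synthetic accelerometer magnitudes for the three activities.
rng = random.Random(7)
profiles = {"resting": (0.1, 0.05), "walking": (1.0, 0.2), "running": (2.5, 0.4)}
samples = [(rng.gauss(mu, sigma), label)
           for label, (mu, sigma) in profiles.items()
           for _ in range(30)]

scores = cross_validate(samples, k=5)
average_score = mean(scores)
```

Each sample is used for testing exactly once, so the averaged score depends less on one lucky or unlucky split than a single hold-out estimate would.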

artificial intelligence, machine learning, training dataset, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.45)

Exploiting random projections and sparsity with random forests and gradient boosting methods -- Application to multi-label and multi-output learning, random forest model compression and leveraging input sparsity

Joly, Arnaud

arXiv.org Machine Learning, Apr-26-2017

Within machine learning, the supervised learning field aims at modeling the input-output relationship of a system, from past observations of its behavior. Decision trees characterize the input-output relationship through a series of nested if-then-else questions, the testing nodes, leading to a set of predictions, the leaf nodes. Several such trees are often combined for state-of-the-art performance: random forest ensembles average the predictions of randomized decision trees trained independently in parallel, while tree boosting ensembles train decision trees sequentially to refine the predictions made by the previous ones. The emergence of new applications requires scalable supervised learning algorithms in terms of computational power and memory space with respect to the number of inputs, outputs, and observations without sacrificing accuracy. In this thesis, we identify three main areas where decision tree methods could be improved for which we provide and evaluate original algorithmic solutions: (i) learning over high dimensional output spaces, (ii) learning with large sample datasets and stringent memory constraints at prediction time and (iii) learning over high dimensional sparse input spaces.
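The contrast between averaging independent trees and training trees sequentially can be illustrated with one-split regression trees (stumps). This is a simplified sketch, not the thesis's algorithms: the bagged ensemble uses bootstrap resampling only (with a single input there is no feature subsampling as in a full random forest), the boosted ensemble fits each stump to the current residuals, and the data, tree counts, and learning rate are assumed.

```python
import random


def fit_stump(xs, ys):
    """Least-squares regression stump: one threshold, two leaf means."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all inputs identical: fall back to a constant
        overall = sum(ys) / len(ys)
        return (xs[0], overall, overall)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def bagged_stumps(xs, ys, n_trees=20, seed=0):
    """Independent stumps on bootstrap samples; predictions are averaged."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(xs)) for _ in xs]
        stumps.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return lambda x: sum(stump_predict(s, x) for s in stumps) / len(stumps)


def boosted_stumps(xs, ys, n_trees=20, learning_rate=0.5):
    """Sequential stumps, each fitted to the residuals left by earlier ones."""
    base = sum(ys) / len(ys)
    residuals = [y - base for y in ys]
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        residuals = [r - learning_rate * stump_predict(stump, x)
                     for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Assumed toy regression problem: y = x squared on a small grid.
xs = [i / 10 for i in range(40)]
ys = [x ** 2 for x in xs]

baseline = sum(ys) / len(ys)
baseline_mse = mse(lambda x: baseline, xs, ys)
bagged_mse = mse(bagged_stumps(xs, ys), xs, ys)
boosted_mse = mse(boosted_stumps(xs, ys), xs, ys)
```

The bagged stumps could be fitted in any order or in parallel, since none depends on another; the boosted stumps cannot, because each one is trained on what the previous ones left unexplained.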

artificial intelligence, computational statistic & data analysis, machine learning, (20 more...)

arXiv.org Machine Learning

1704.08067

Country:

Europe (1.00)
North America > United States > California (0.45)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.92)
Education (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(3 more...)
